Better subset regression (Mar 2013)
Abstract
To find efficient screening methods for high dimensional linear regression models, this paper studies the relationship between model fitting and screening performance. Under a sparsity assumption, we show that, in a general asymptotic setting, any subset that includes the true submodel yields a smaller residual sum of squares (i.e., fits the model better) than any subset that does not. This suggests that, for screening important variables, we can follow a "better fitting, better screening" rule: among candidate subsets, pick the one with the better model fit. To find such a subset, we consider the optimization problem associated with best subset regression. An EM algorithm, called orthogonalizing subset screening, and an accelerated version of it are proposed to search for the best subset. Although neither algorithm is guaranteed to return the best subset, their monotonicity property ensures that the subset they yield fits the model better than the initial subsets generated by popular screening methods, so that subset can achieve better screening performance asymptotically. Simulation results show that our methods are highly competitive in high dimensional variable screening, even for finite sample sizes.
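As a concrete illustration of the "better fitting, better screening" rule, the following minimal sketch (ours, not the paper's implementation; the helper names rss and better_subset are illustrative) compares candidate subsets of the same size produced by different screening methods and keeps the one whose least squares fit attains the smallest residual sum of squares. In the paper's scheme, the EM iterations improve an initial screened subset's fit before such a comparison, which by this rule improves its screening performance.

    import numpy as np

    def rss(X, y, subset):
        """Residual sum of squares of the least squares fit on the given subset."""
        Xs = X[:, sorted(subset)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        return float(resid @ resid)

    def better_subset(X, y, candidates):
        """Pick the candidate subset with the best model fit (smallest RSS)."""
        return min(candidates, key=lambda s: rss(X, y, s))

    # Toy data: p = 200 predictors, n = 50 samples, true submodel {0, 1, 2}.
    rng = np.random.default_rng(0)
    n, p = 50, 200
    X = rng.standard_normal((n, p))
    y = X[:, 0] + 0.8 * X[:, 1] - 0.6 * X[:, 2] + 0.1 * rng.standard_normal(n)

    # Two same-size candidates, e.g. from two screening methods; the rule
    # asymptotically favours the one containing the true submodel.
    print(better_subset(X, y, [{0, 1, 2}, {0, 1, 7}]))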
Similar resources
Fast Bayesian Feature Selection for High Dimensional Linear Regression in Genomics via the Ising Approximation
Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. Here, we in...
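The excerpt above stops short of the method itself. As a small illustration (ours, not from the paper) of why this regime is hard: with more predictors than samples, ordinary least squares can interpolate pure noise, so goodness of fit alone cannot identify relevant features.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 20, 50                          # more predictors than samples
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)             # pure noise: no true signal
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(np.allclose(X @ beta, y))        # True: a perfect fit to noise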
Corrigendum to "Ensembling neural networks: Many could be better than all" [Artificial Intelligence 137 (1-2) (2002) 239-263]
In 2002, we published in Artificial Intelligence an extension [1] of a paper we presented at IJCAI-01 [2]. In Section 2 of the IJCAI-01 paper [2] and in Section 2.1 of the AIJ paper [1], we presented a criterion for selecting a subset of an ensemble of neural networks that could yield better performance than using all members of the ensemble for regression. The fundamental motivation for this c...
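The selection criterion itself is not reproduced in this excerpt. As a hedged sketch of the general idea ("many could be better than all") rather than the criterion from [1]/[2], greedy backward pruning drops an ensemble member whenever doing so lowers the validation mean squared error of the averaged prediction; the name prune_ensemble is illustrative.

    import numpy as np

    def prune_ensemble(preds, y_val):
        """preds: (m, n) array of m members' predictions on a validation set.
        Returns the indices of the members kept in the pruned ensemble."""
        kept = list(range(preds.shape[0]))
        improved = True
        while improved and len(kept) > 1:
            improved = False
            base = np.mean((preds[kept].mean(axis=0) - y_val) ** 2)
            for i in list(kept):
                trial = [j for j in kept if j != i]
                mse = np.mean((preds[trial].mean(axis=0) - y_val) ** 2)
                if mse < base:             # the smaller ensemble fits better
                    kept, base, improved = trial, mse, True
        return kept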
Forbidden vertices (arXiv:1309.2545v2 [math.OC], 1 Mar 2014)
In this work, we introduce and study the forbidden-vertices problem. Given a polytope P and a subset X of its vertices, we study the complexity of linear optimization over the subset of vertices of P that are not contained in X. This problem is closely related to finding the k-best basic solutions to a linear problem. We show that the complexity of the problem changes significantly depending o...
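For intuition only, here is a brute-force sketch under the strong assumption that the polytope's vertex list is available explicitly; the paper's complexity question arises precisely because, in general, it is not.

    import numpy as np

    def best_allowed_vertex(vertices, forbidden, c):
        """Maximize the linear objective c over vertices whose index is not forbidden."""
        allowed = [v for i, v in enumerate(vertices) if i not in forbidden]
        return max(allowed, key=lambda v: float(np.dot(c, v)))

    # Unit square with its optimum (1, 1) forbidden: a next-best vertex wins.
    square = [np.array(v, dtype=float) for v in [(0, 0), (1, 0), (0, 1), (1, 1)]]
    print(best_allowed_vertex(square, forbidden={3}, c=np.array([1.0, 1.0])))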
Kakeya-type sets in finite vector spaces (Mar 2010)
For a finite vector space V and a non-negative integer r ≤ dim V, we estimate the smallest possible size of a subset of V containing a translate of every r-dimensional subspace. In particular, we show that if K ⊆ V is the smallest subset with this property, n denotes the dimension of V, and q is the size of the underlying field, then for r bounded and r < n ≤ rq we have |V \ K| = Θ(nq); this im...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013